What is this project about?

The Supreme Court of the United States is one of the most powerful organs of American government. What makes it different from other institutions - and what makes it fascinating - is that the actions of this body depend entirely on the opinions of a handful of individuals who hold their jobs for decades and perform their duties in public. No other government department is so consistently personal. Anyone who wants to understand the court needs to understand the tendencies of its members.

In this project, I study the transcripts of oral arguments before the court and attempt to gain some insight into the justices. To limit the scope of this project, I focus only on the interactions between the justices and the petitioners. (The petitioner is the party who requests that the court hear the case.) I also choose to focus only on cases argued in the 2019 session, and I exclude two cases where it is difficult to cleanly assign parties to the roles of petitioner and respondent (dockets 18-1323 and 18-1334).

How does it work?

Data Collection

The project uses two data sources: the official PDF transcripts of oral arguments, and a table of the justices’ voting records.

First, the transcript data is collected from the PDFs and stored in a table. Then, I join the table of voting records based on justice names and docket numbers.
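
The project’s pipeline is written in R; purely as an illustration, the same join can be sketched in Python with pandas (the column names and docket numbers here are hypothetical, not the project’s actual schema):

```python
import pandas as pd

# Hypothetical schemas -- the project's real tables may differ.
speech = pd.DataFrame({
    "justice": ["JUSTICE ALITO", "JUSTICE KAGAN"],
    "docket":  ["19-0001", "19-0001"],
    "word_count": [412, 388],
})
votes = pd.DataFrame({
    "justice": ["JUSTICE ALITO", "JUSTICE KAGAN"],
    "docket":  ["19-0001", "19-0001"],
    "vote": ["against", "for"],
})

# Join per-case speech metrics to voting records on justice name
# and docket number.
merged = speech.merge(votes, on=["justice", "docket"], how="left")
```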

At a high level, the PDF mining process works like this:
1. Open one case PDF
2. Extract the relevant section from the document - oral argument of the petitioner
3. Extract the text for one justice who speaks in that section
4. Analyze the extracted text for the justice (sentiment analysis + count of words, questions, interruptions)
5. Create a single row of data for the analysis of the justice in this case
6. Repeat for every justice in the case
7. Repeat for every case PDF from 2019
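
Steps 2 and 3 of the pipeline above can be sketched in miniature. The real project works in R on full PDF transcripts; this Python sketch assumes the text has already been pulled from a PDF, and the section-header strings are assumptions about the stenographers’ formatting:

```python
import re

# Stand-in for text already extracted from one case PDF; the header
# strings are assumptions about the stenographers' formatting.
transcript = """ORAL ARGUMENT OF MR. SMITH
ON BEHALF OF THE PETITIONER
MR. SMITH: Mr. Chief Justice, and may it please the Court --
JUSTICE KAGAN: Counsel, what does the statute say?
ORAL ARGUMENT OF MS. JONES
ON BEHALF OF THE RESPONDENT
MS. JONES: Thank you, Mr. Chief Justice."""

# Step 2: slice out the petitioner's section of the argument.
section = re.search(
    r"ON BEHALF OF THE PETITIONER(.*?)ON BEHALF OF THE RESPONDENT",
    transcript, flags=re.DOTALL,
).group(1)

# Step 3: collect the lines spoken by one justice in that section.
kagan_lines = re.findall(r"JUSTICE KAGAN:(.*)", section)
```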

Automated Text Analysis

There are four components of text analysis which happen when building the table:

  • sentiment analysis
    • sentence-based sentiment analysis with the sentimentr package
    • unigram-based sentiment analysis with the afinn lexicon
  • count the number of questions the justice asked the petitioner
  • count the number of times the justice interrupted the petitioner
  • count the number of words spoken during Q&A with the petitioner
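
The counting components above can be illustrated with a Python sketch (the project does this in R; the sample text and the speaker-boundary rules here are simplified stand-ins):

```python
import re

# Illustrative snippet of cleaned petitioner-section text; the speaker
# conventions mirror the transcript formatting described later.
text = ("MR. SMITH: The statute clearly -- "
        "JUSTICE GORSUCH: But why should we read it that way? "
        "MR. SMITH: Because the text says so. "
        "JUSTICE GORSUCH: Is that really the best reading?")

justice = "JUSTICE GORSUCH"

# Everything this justice says, up to the next change of speaker.
speaker = r"(?:MR\.|MS\.|JUSTICE|CHIEF JUSTICE) [A-Z]+:"
spoken = re.findall(justice + r": (.*?)(?=" + speaker + r"|$)", text)

questions = sum(chunk.count("?") for chunk in spoken)    # questions asked
interruptions = len(re.findall(r"-- " + justice, text))  # abrupt stops
words = sum(len(chunk.split()) for chunk in spoken)      # words spoken
```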

Further manual analysis is done after building the table.

More on Sentiment Scores

For every case, I compute a mean sentiment score for the justices’ speech with both methods. I chose these two methods because they both produce numerical scores, which allows us to compare the degree of positivity/negativity in different circumstances. The sentimentr method allows for calculating sentiment on a whole sentence, which has the potential to uncover critical differences in meaning due to negation of positive/negative terms (e.g. “That’s not good news”).

Mean sentiment scores are calculated like this:

  • sentimentr: weighted mean of sentiment score for all sentences, with weights determined by the number of words in each sentence
  • afinn: remove stop words from text, take mean value of remaining words which have a sentiment score in the afinn lexicon
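
As a rough Python illustration of the two aggregation rules above (the toy lexicon and stop word list here are made up; the real afinn lexicon scores thousands of English words on a -5 to +5 scale):

```python
# Toy AFINN-style lexicon and stop word list -- illustrative only.
lexicon = {"good": 3, "bad": -3, "agree": 1, "problem": -2}
stop_words = {"i", "the", "a", "that", "is", "with"}

def afinn_mean(text):
    """Drop stop words, then average the words found in the lexicon."""
    tokens = [w for w in text.lower().split() if w not in stop_words]
    scored = [lexicon[w] for w in tokens if w in lexicon]
    return sum(scored) / len(scored) if scored else 0.0

def weighted_mean(sentence_scores, sentence_lengths):
    """sentimentr-style aggregation: per-sentence scores weighted
    by each sentence's word count."""
    total = sum(sentence_lengths)
    return sum(s * n for s, n in zip(sentence_scores, sentence_lengths)) / total

case_score = afinn_mean("I agree that is a good point")  # (1 + 3) / 2 = 2.0
```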

More on Detecting Interruptions

It’s possible to detect interruptions automatically because court stenographers have an extremely detailed and consistent way of recording everything vocalized during the proceedings. They don’t just record words; they also record pauses, stutters, and abrupt stops. In our text data, any one of these is represented by two dashes. We can detect an interruption as a line that ends with an abrupt stop, followed by text from another speaker. Consider the example below:

Mr. Gant’s line ends with -- and is followed by Justice Kavanaugh starting a new line. This is an interruption by Kavanaugh. After cleaning the text (removing line numbers, excess spaces, etc.) we can detect interruptions by Kavanaugh with a regular expression like this: "-- JUSTICE KAVANAUGH".

This process is straightforward thanks to the extremely standardized nature of the court reporting: abrupt stops are always represented with two dashes, names are written in all caps when there is a change of speaker, and speaker names are always written the same way within a document. We don’t have to worry about accidentally counting a stutter as an interruption, because a stutter isn’t followed by JUSTICE NAME in all caps in the dialogue.

To handle this task automatically, I use a different regular expression generated for each justice.
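
A minimal Python sketch of that per-justice pattern generation (the project’s version is in R; the cleaned line below echoes the Gant/Kavanaugh example above):

```python
import re

justices = ["JUSTICE KAVANAUGH", "JUSTICE KAGAN", "CHIEF JUSTICE ROBERTS"]

# Cleaned text in the style of the Gant/Kavanaugh example above.
cleaned = ("MR. GANT: And the statute -- "
           "JUSTICE KAVANAUGH: Counsel, what about notice? "
           "MR. GANT: I was getting -- "
           "JUSTICE KAVANAUGH: Please answer the question.")

# One pattern per justice: an abrupt stop immediately followed by
# that justice taking over.
counts = {j: len(re.findall(r"-- " + re.escape(j), cleaned)) for j in justices}
```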

Findings

Word Count

Word Count over Time

The chart below shows the number of words spoken by each justice (during the petitioners’ arguments) in cases through the 2019 session. There aren’t many interesting patterns here, except for Chief Justice Roberts suddenly getting a bit more talkative at the end of the session. Other than that, nothing to see --

Wait a minute, what’s up with Justice Thomas?

Before we go any further into the analysis, I have to address one thing: this is not a mistake. There was no issue parsing the PDFs, there’s nothing wrong with the data. Justice Clarence Thomas simply doesn’t speak very often. In fact, he once went 10 years without asking a question. He broke that streak in 2016, but he still doesn’t chime in much - except during the pandemic.

Of the 56 arguments analyzed, Thomas spoke during 10 of them. I chose not to recode his NA values with 0s - I think the absence of dots on the graph makes the point better.

Word Count by Vote Type

Can we gauge which way a justice is leaning based on how much they talk to the petitioner? It depends on the justice.

Most justices spoke more in cases where they eventually voted against the petitioner. Elena Kagan is the only exception. For Kavanaugh and Alito, the difference is quite pronounced. Alito can really go on a rant when he doesn’t buy the petitioner’s argument.

Questions

Do the justices ask more questions in cases where they eventually vote against the petitioner?

Yes, according to a quick t-test in R: the estimated average number of questions from a justice who votes FOR the petitioner is 3.58 compared to 4.48 from a justice who votes AGAINST, with a p-value of 0.01. But that’s if we consider all justices together. Take a look at the individual justices below:

The difference is quite meaningful for Alito and Gorsuch, but more subtle for the other justices. Breyer and Sotomayor aren’t tipping their hands based on how many questions they ask.

If you take a look at the table below, you’ll see that the average number of questions asked is significantly different only for Alito, Gorsuch, and Kavanaugh. This table was generated by running a t-test on each justice individually.
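
The per-justice tests were run in R. A comparable Welch two-sample t-test in Python looks like this (the question counts below are made up for illustration, not the project’s data):

```python
from scipy import stats

# Made-up question counts for one justice, split by eventual vote;
# illustrative only, not the project's data.
questions_for = [3, 4, 2, 5, 3, 4, 3, 4]
questions_against = [6, 7, 5, 8, 7, 6, 9, 7]

# Welch's two-sample t-test (what R's t.test() runs by default).
result = stats.ttest_ind(questions_for, questions_against, equal_var=False)
```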

Table: Average Number of Questions by Vote Type

  Justice               | Voted for Petitioner | Voted against Petitioner | Statistical Significance | P-Value
  JUSTICE ALITO         | 3.79                 | 6.89                     | Significant              | 0.003
  JUSTICE GORSUCH       | 3.71                 | 7.05                     | Significant              | 0.019
  JUSTICE KAVANAUGH     | 2.42                 | 3.83                     | Significant              | 0.021
  JUSTICE GINSBURG      | 2.46                 | 3.19                     | Not Significant          | 0.277
  JUSTICE SOTOMAYOR     | 5.40                 | 4.26                     | Not Significant          | 0.317
  JUSTICE KAGAN         | 4.74                 | 3.93                     | Not Significant          | 0.431
  JUSTICE BREYER        | 5.00                 | 5.59                     | Not Significant          | 0.618
  CHIEF JUSTICE ROBERTS | 2.30                 | 2.16                     | Not Significant          | 0.835
  JUSTICE THOMAS        | 2.50                 | 2.67                     | Not Significant          | 0.894

Interruptions

Conventional wisdom says “people who agree with you don’t cut you off when you’re speaking”. Based on the plots below, this might not be entirely true of Supreme Court justices. They are simply opinionated folks who want to get a word in.

Ginsburg and Sotomayor both had occasions where they committed high numbers of interruptions in cases where they sided with the petitioner.

Gorsuch exhibits an obvious pattern of interrupting more often in cases where he disagrees with the petitioner, but for the other justices, the conditional means are too close to be worth t-testing.

Sentiment

Different Methods, Different Scores

I calculated sentiment scores on complete sentences with the sentimentr package, and on unigrams (single words) with the afinn lexicon. I found that these two approaches resulted in significantly different interpretations of the justices’ speech.

The two methods have different numerical scales, but they share a common principle: zero is neutral, positive numbers are “positive” emotions, negative numbers are “negative” emotions. Now, take a look at the score distributions in our data:

The afinn scores lean very slightly negative, while the sentimentr scores are notably positive on average.

These two methods are different enough that they’re not even really correlated. I standardized their scores (by calculating z-scores) and plotted them against each other:
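
Standardization is simple enough to sketch directly (in Python here, though the project uses R; the scores below are placeholders):

```python
# Put two sentiment series on a common scale by standardizing:
# z = (x - mean) / standard deviation.
def z_scores(xs):
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - mean) / sd for x in xs]

afinn_scores = [-0.5, -0.1, 0.0, 0.2, -0.6]  # placeholder values
standardized = z_scores(afinn_scores)
```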

Among the cases which have sentiment scores closer to the mean, there appears to be some mild positive correlation between the two scoring systems. But as you move towards the extreme cases - starting at even one standard deviation away from the mean on either axis - the relationship totally falls apart. The upper-left and lower-right quadrants provide us a few examples of texts where they couldn’t even agree on the polarity, much less the intensity, of the sentiment. (Perhaps this is due to sentimentr recognizing negation, which isn’t possible when analyzing unigrams…)

I decided to proceed with the sentiment analysis using the afinn lexicon on unigrams. The sentimentr package is a great solution for other applications, but it seems to have an unusually positive bias on this particular data.

Sentiment Distribution by Justice

Let’s take a look at the overall sentiment distributions for the individual justices in the 2019 session. If you blur your eyes and take a few steps back from your monitor, these distributions look surprisingly normal-ish. No justice is skewed drastically to the left or right, though there are some apparent tails.

Gorsuch has an interesting pair of peaks in his distribution, with a depression around the neutral zero line. More on that in the next section.

Do you notice any pattern regarding which justices lean to the left of the neutral line?

According to my eyes, it’s justices Breyer, Ginsburg, Kagan, and Sotomayor. But no need to trust my eyes, consider the table below:

Table: Justices Ranked by Mean Sentiment Score

  Justice               | Mean Sentiment Score
  JUSTICE KAGAN         | -0.8430199
  JUSTICE GINSBURG      | -0.7032573
  JUSTICE BREYER        | -0.5167652
  JUSTICE SOTOMAYOR     | -0.4893529
  JUSTICE ALITO         | -0.3261785
  JUSTICE GORSUCH       | -0.1086652
  CHIEF JUSTICE ROBERTS | -0.0702282
  JUSTICE THOMAS        |  0.0166667
  JUSTICE KAVANAUGH     |  0.1841721
Interesting. Breyer, Ginsburg, Kagan, and Sotomayor represent the liberal wing of the court. They sit at the top of this table as the most negative, on average. The conservative justices, on the other hand, are lumped together at the bottom. (If the pattern were perfect, though, Alito and Roberts would switch places.)

Why might this be? It could be a result of the particular cases brought before the court this year. Donald Trump and Attorney General William Barr were party to a handful of cases in this session. Or it could be that these four justices have banded together and gotten defensive in a court where conservatives outnumber them.

Sentiment Distribution by Justice, Conditioned by Vote Type

When we break down the distribution by vote type, we see a similar pattern as we did with the other metrics we studied: some justices exhibit obviously different patterns based on their opinion of the petitioner’s argument, while others are a bit harder to read.

Gorsuch’s conditional distributions are nearly mirror images of each other: one describes his sentiment during arguments he agrees with, and a flipped one describes arguments he doesn’t. We can say the same for Sotomayor. For Sotomayor, we should also note that there is no overlap at the extreme ends of the spectrum: in her most negative cases, she always voted against the petitioner, and in her most positive cases, she always voted for the petitioner.

Ginsburg, Kagan, and Thomas also have easy-to-read distributions that clearly differ by their eventual vote type.

Alito and Breyer are a bit more even-handed, at least in this regard. Perhaps that’s easy when you are always slightly grumpy.

Chief Justice Roberts has a distribution that makes you look twice. Yes, his mean sentiment score is positive when voting against the petitioner, and negative when voting for the petitioner. (Remember, this analysis covers only the interactions between the justices and the petitioners - the other side of each case, the respondent, is excluded. So this is very odd!)

I will speculate about why his sentiment distribution seems out of line with his votes:

Roberts has a tough job. He understands that he is often the “swing vote” in cases decided by this court. He knows that people see the Supreme Court as political, and he doesn’t like that. Additionally, as the chief justice, he has a certain duty to lead the court - and to lead courtroom discussions - in an unbiased way. For these reasons, I think we may not always see a candid display of his thoughts during oral arguments, but rather an attempt to appear neutral and handle both sides of the case fairly. If he already has an opinion during oral arguments, he may be trying to conceal it.

Vocabulary Analysis

Ginsburg & Gorsuch: Top Words by Vote Type

I wanted to see if there were any interesting differences in the vocabularies of the justices. To start, I simply counted the number of times the justices said different words and ranked them from most frequent to least frequent. This was not interesting; it turns out that all of the justices say the words “person” and “law”. So, I added a few more terms to my custom stop word list (stop words are words excluded from the analysis), and I decided to condition these word rankings on vote type.

Take a look at the words used by Ginsburg and Gorsuch:

These charts provide a little insight about what is important to the justices. Looking at the blue bars for Ginsburg, we can see that she talks about gender and equality - no surprise given her outstanding achievements in this domain. For Gorsuch, we spot some bedrock conservative themes: religion, speech, and the constitution (“amendment”).

We also notice that one of the top words spoken by Gorsuch in cases where he voted for the petitioner is “yeah”. This isn’t an ideological insight, but it’s an interesting pattern for anyone who wants to predict how he will vote.

Among the orange bars, we see an interesting point of commonality between two very different jurists: words about the government, the courts, and the legal process.

TF-IDF for all Justices

Term frequency-inverse document frequency (TF-IDF) is a measure of how important one word is to a single document after considering the presence of that word across a whole collection of documents. A word that is rare in general but appears frequently in one document and not others will have a high TF-IDF; a word that is common and appears frequently in the whole collection of documents will have a low TF-IDF. In this way, TF-IDF helps us identify the distinguishing words associated with a document.

Below, we treat each justice’s speech across the whole 2019 session as a separate document. (For example: all the words Roberts said during oral arguments this year are treated as one big document, all the words Thomas said are a separate document, all the words Sotomayor said are another document, etc.)
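
The TF-IDF calculation itself is compact. A Python sketch on tiny made-up word lists (the project computes this in R; the standard tf × idf formula is used here):

```python
import math

# One tiny "document" per justice -- illustrative word lists,
# not real transcript data.
docs = {
    "JUSTICE BREYER":        ["statute", "dah", "dah", "hypothetical"],
    "CHIEF JUSTICE ROBERTS": ["statute", "briefly", "counsel"],
    "JUSTICE SOTOMAYOR":     ["statute", "concealing", "counsel"],
}

def tf_idf(term, name):
    tf = docs[name].count(term) / len(docs[name])           # term frequency
    containing = sum(1 for d in docs.values() if term in d)
    idf = math.log(len(docs) / containing)                  # inverse doc freq
    return tf * idf
```

A word found in every document gets an idf of log(1) = 0, so ubiquitous words like “statute” score zero no matter how often a justice says them.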

What words distinguish the justices from one another?

Here, we gain less insight into the justices’ beliefs and learn more about their mannerisms. Chief Justice Roberts’s speech is distinguished by the word “briefly”, probably because he is directing the proceedings (“Briefly, counsel.”). Justice Sotomayor says “concealing” a lot more than other justices; perhaps she thinks the lawyers are hiding the true nature of their arguments. Justice Breyer is unique in saying… “dah”?

Not a text-parsing error. Here are some snippets of Breyer using the term “dah-dah-dah”:

We can see why this word scored such a high TF-IDF: it is highly unique to Justice Breyer. If you gave me an unlabeled transcript of Supreme Court arguments and asked me to find a paragraph where Breyer was speaking, the first thing I would do is search for the term “dah”.

TF-IDF for Roberts, Conditioned on Vote Type

If we focus on a single justice, and treat their speech prior to voting for and prior to voting against the petitioner as two separate documents, can we identify any terms that are unique to their speech based on how they will vote later?

Let’s try it on Chief Justice Roberts, who is often seen as the swing vote:

We have a mix of signal and noise. Some of these terms are informative, but some of them are likely particular to the cases which came before the court in this session (such as “groundwater”).

Words like “claim” and “theory” are often used when distancing oneself from another party’s beliefs. “Frivolous” is not a word you want to hear from a judge during your oral argument. We can see why these unigrams distinguish the voted-against text from the voted-for text.

Among the blue columns, I would like to draw your attention to the word “yeah”. This is something we saw previously with Gorsuch: a mannerism of saying “yeah” during arguments when he agrees with the petitioner’s position.

Wrap Up

So, what’s it good for?

This project is about studying the speaking habits of the justices during oral arguments. It could be the starting point for building a predictive model for case outcomes (as others have done), but the primary purpose is to gain a deeper understanding of the human beings on the court.

Here is a short, non-exhaustive, non-scientific recap of things we learned about the justices:

CHIEF JUSTICE ROBERTS

  • sentiment behind his words might not match his real beliefs
  • often instructs lawyers to argue their points “briefly”
  • says “claim” and “theory” when he disagrees with your argument, “yeah” when he buys it

JUSTICE THOMAS

  • doesn’t talk much

JUSTICE BREYER

  • talks a lot
  • says “dah-dah-dah” in the middle of his hypothetical examples

JUSTICE ALITO

  • talks a lot more when he disagrees with you
  • asks more questions when he disagrees with you

JUSTICE SOTOMAYOR

  • might interrupt you a lot; doesn’t mean she is against your argument
  • sentiment behind her words matches her beliefs pretty well, especially if the sentiment is extreme

JUSTICE KAGAN

  • she’s the only justice who talks more when she agrees with you
  • sentiment is negative on average, and extra negative when she disagrees with you

JUSTICE GORSUCH

  • asks more questions when he disagrees with you
  • interrupts more when he disagrees with you
  • says “yeah” when he agrees with you, talks about the government and courts when he doesn’t

JUSTICE KAVANAUGH

  • quiet when he agrees with you, moderately prolix when he doesn’t
  • asks more questions when he disagrees with you

JUSTICE GINSBURG

  • commonly negative in sentiment, and even more so when she disagrees with you
  • interested in topics concerning equality
  • talks about government and courts when she doesn’t agree with you